Resampling Effects on Significance Analysis of Network Clustering and Ranking

Authors

  • Atieh Mirshahvalad
  • Olivier H. Beauchesne
  • Éric Archambault
  • Martin Rosvall
Abstract

Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984-2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
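The contrast between the two resampling schemes can be illustrated with a small sketch. Everything in it is hypothetical: the toy citation counts, the function names, and in particular the choice of a Poisson model for the parametric citation resampling, since the abstract does not specify the distribution. For a single link, article resampling draws whole articles with replacement and re-sums their citations, so citations from the same article stay together. The assumed parametric scheme redraws the link weight from a Poisson distribution centered on the observed total, which treats citations as independent.

```python
import math
import random
import statistics


def article_resample(counts, rng):
    """Non-parametric: draw articles with replacement and re-sum their citations."""
    return sum(rng.choices(counts, k=len(counts)))


def poisson_draw(mean, rng):
    """Poisson sample via Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def citation_resample(counts, rng):
    """Parametric (ASSUMED Poisson): redraw the link weight around the observed total."""
    return poisson_draw(sum(counts), rng)


rng = random.Random(0)

# Hypothetical toy link: 100 articles in one journal, 20 of which each cite
# the target journal 5 times. Observed link weight = 100 citations.
counts = [0] * 80 + [5] * 20
n_rep = 5000

article_var = statistics.pvariance(
    [article_resample(counts, rng) for _ in range(n_rep)]
)
citation_var = statistics.pvariance(
    [citation_resample(counts, rng) for _ in range(n_rep)]
)

# Analytically, the article bootstrap variance is n * Var(counts) = 100 * 4 = 400,
# while the Poisson variance equals the mean, 100.
print(f"article resampling variance:  {article_var:.1f}")
print(f"citation resampling variance: {citation_var:.1f}")
```

In this toy case the citations are clustered within a few articles, so the link-weight variance under article resampling is about four times the variance under the assumed Poisson scheme. This is one way the underestimation described in the abstract can arise; the paper's actual data and resampling details may differ.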

Similar resources

Non-parametric resampling of random walks for spectral network clustering

Parametric resampling schemes have been recently introduced in complex network analysis with the aim of assessing the statistical significance of graph clustering and the robustness of community partitions. We propose here a method to replicate structural features of complex networks based on the non-parametric resampling of the transition matrix associated with an unbiased random walk on the g...

Measuring the efficiency of a three-stage network using data envelopment analysis approach considering dual boundary

This paper presents a method for performance evaluation, ranking and clustering based on the double-frontier view to analyze the complex networks. The model allows us to open the structure of the “black box” and can help to obtain important information about efficient and inefficient points of the system. In this paper, we consider a three-stage network, in respect to the additional desirable a...

A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

Considering the concept of clustering, the main idea of the present study is based on the fact that all stocks for choosing and ranking will not be necessarily in one cluster. Taking the mentioned point into account, this study aims at offering a new methodology for making decisions concerning the formation of a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...

Ranking Network-Structured Decision-Making Units and Its Application in Bank Branches

Data envelopment analysis (DEA) is a method used for measuring the efficiency of decision-making units. Unlike the standard models, which assume decision-making units to be a black box, network data envelopment analysis focuses on the internal structure of these units. Some researchers have developed a two-stage method where all the inputs are entirely used in the first stage, producin...

A Fully Fuzzy Method of Network Data Envelopment Analysis for Assessing Revenue Efficiency Based on Ranking Functions

The purpose of this paper is to evaluate the revenue efficiency in the fuzzy network data envelopment analysis. Precision measurements in real-world data are not practically possible, so assuming that data is crisp in solving problems is not a valid assumption. One way to deal with imprecise data is fuzzy data. In this paper, linear ranking functions are used to transform the full fuz...

Journal title:

Volume 8, issue

Pages -

Publication date: 2013